Search Results: "Enrico Zini"

14 June 2021

Enrico Zini: Pipelining

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience. Running actions on a server is nice, but a network round trip for each action is not very efficient. If I need to run a linear sequence of actions, I can stream them all to the server, and then read replies streamed from the server as they get executed. This technique is called pipelining and one can see it used, for example, in Redis, or Mitogen. Roles Ansible has the concept of "Roles" as a series of related tasks: I'll play with that. Here's an example role to install and setup fail2ban:
class Role(role.Role):
    def main(self):
        self.add(builtin.apt(
            name=["fail2ban"],
            state="present",
        ))
        self.add(builtin.copy(
            content=inline("""
                [postfix]
                enabled = true
                [dovecot]
                enabled = true
            """),
            dest="/etc/fail2ban/jail.local",
            owner="root",
            group="root",
            mode=0o644,
        ), name="configure fail2ban")
I prototyped roles as classes, with methods that push actions down the pipeline. If an action fails, all further actions for the same role won't executed, and will be marked as skipped. Since skipping is applied per-role, it means that I can blissfully stream actions for multiple roles to the server down the same pipe, and errors in one role will stop executing that role and not others. Potentially I can get multiple roles going with a single network round-trip:
#!/usr/bin/python3
import sys
from transilience.system import Mitogen
from transilience.runner import Runner
@Runner.cli
def main():
    system = Mitogen("my server", "ssh", hostname="server.example.org", username="root")
    runner = Runner(system)
    # Send roles to the server
    runner.add_role("general")
    runner.add_role("fail2ban")
    runner.add_role("prosody")
    # Run until all roles are done
    runner.main()
if __name__ == "__main__":
    sys.exit(main())
That looks like a playbook, using Python as glue rather than YAML. Decision making in roles Besides filing a series of actions, a role may need to take decisions based on the results of previous actions, or on facts discovered from the server. In that case, we need to wait until the results we need come back from the server, and then decide if we're done or if we want to send more actions down the pipe. Here's an example role that installs and configures Prosody:
from transilience import actions, role
from transilience.actions import builtin
from .handlers import RestartProsody
class Role(role.Role):
    """
    Set up prosody XMPP server
    """
    def main(self):
        self.add(actions.facts.Platform(), then=self.have_facts)
        self.add(builtin.apt(
            name=["certbot", "python-certbot-apache"],
            state="present",
        ), name="install support packages")
        self.add(builtin.apt(
            name=["prosody", "prosody-modules", "lua-sec", "lua-event", "lua-dbi-sqlite3"],
            state="present",
        ), name="install prosody packages")
    def have_facts(self, facts):
        facts = facts.facts  # Malkovich Malkovich Malkovich!
        domain = facts["domain"]
        ctx =  
            "ansible_domain": domain
         
        self.add(builtin.command(
            argv=["certbot", "certonly", "-d", f"chat. domain ", "-n", "--apache"],
            creates=f"/etc/letsencrypt/live/chat. domain /fullchain.pem"
        ), name="obtain chat certificate")
        with self.notify(RestartProsody):
            self.add(builtin.copy(
                content=self.template_engine.render_file("roles/prosody/templates/prosody.cfg.lua", ctx),
                dest="/etc/prosody/prosody.cfg.lua",
            ), name="write prosody configuration")
            self.add(builtin.copy(
                src="roles/prosody/templates/firewall-ruleset.pfw",
                dest="/etc/prosody/firewall-ruleset.pfw",
            ), name="write prosody firewall")
    # ...
This files some general actions down the pipe, with a hook that says: when the results of this action come back, run self.have_facts(). At that point, the role can use the results to build certbot command lines, render prosody's configuration from Jinja2 templates, and use the results to file further action down the pipe. Note that this way, while the server is potentially still busy installing prosody, we're already streaming prosody's configuration to it. If anything goes wrong with the installation of prosody's package, the role will be marked as failed and all further actions of the same role, even those filed by have_facts() will be skipped. Notify and handlers In the previous example self.notify() also appears: that's my attempt to model the equivalent of Ansible's handlers. If any of the actions inside the with produce changes, then the RestartProsody role will be executed, potentially filing more actions ad the end of the playbook. The runner will take care of collecting all the triggered role classes in a set, which discards duplicates, and then running the main() method of all resulting roles, which will cause more actions to be filed down the pipe. Action conditions Sometimes some actions are only meaningful as consequences of other actions. Let's take, for example, enabling buster-backports as an extra apt source:
        a = self.add(builtin.copy(
            owner="root",
            group="root",
            mode=0o644,
            dest="/etc/apt/sources.list.d/debian-buster-backports.list",
            content="deb [arch=amd64] https://mirrors.gandi.net/debian/ buster-backports main contrib",
        ), name="enable backports")
        self.add(builtin.apt(
            update_cache=True
        ), name="update after enabling backports",
           # Run only if the previous copy changed anything
           when= a: ResultState.CHANGED ,
        )
Here we want to update Apt's cache, which is a slow operation, only after we actually write /etc/apt/sources.list.d/debian-buster-backports.list. If the file was already there from a previous run, we can skip downloading the new package lists. The when= attributes adds an annotation to the action that is sent town the pipeline, that says that it should only be run if the state of a previous action matches the given one. In this case, when on the remote it's the turn of "update after enabling backports", it gets skipped unless the state of the previous "enable backports" action is CHANGED. Effects of pipelining I ported enough of Ansible's modules to be able to run the provisioning scripts of my VPS entirely via ansible. This is the playbook run as plain Ansible:
$ time ansible-playbook vps.yaml
[...]
servername       : ok=55   changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
real    2m10.072s
user    0m33.149s
sys 0m10.379s
This is the same playbook run with Ansible speeded up via the Mitogen backend, which makes Ansible more bearable:
$ export ANSIBLE_STRATEGY=mitogen_linear
$ time ansible-playbook vps.yaml
[...]
servername       : ok=55   changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
real    0m24.428s
user    0m8.479s
sys 0m1.894s
This is the same playbook ported to Transilience:
$ time ./provision
[...]
real    0m2.585s
user    0m0.659s
sys 0m0.034s
Doing nothing went from 2 minutes down to 3 seconds! That's the kind of running time that finally makes me comfortable with maintaining my VPS by editing the playbook only, and never logging in to mess with the system configuration by hand! Next steps I'm quite happy with what I have: I can now maintain my VPS with a simple script with quick iterative cycles. I might use it to develop new playbooks, and port them to ansible only when they're tested and need to be shared with infrastructure that needs to rely on something more solid and battle tested than a prototype provisioning system. I might also keep working on it as I have more interesting ideas that I'd like to try. I feel like Ansible reached some architectural limits that are hard to overcome without a major redesign, and are in many way hardcoded in its playbook configuration. It's nice to be able to try out new designs without that baggage. I'd love it if even just the library of Transilience actions could grow, and gain widespread use. Ansible modules standardized a set of management operations, that I think became the way people think about system management, and should really be broadly available outside of Ansible. If you are interesting in playing with Transilience, such as: do get in touch or send a pull request! :) Next step: Reimagining Ansible variables.

Enrico Zini: Use ansible actions in a script

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience. I like many of the modules provided with Ansible: they are convenient, platform-independent implementations of common provisioning steps. They'd be fantastic to have in a library that I could use in normal programs. This doesn't look easy to do with Ansible code as it is. Also, the code quality of various Ansible modules doesn't fit something I'd want in a standard library of cross-platform provisioning functions. Modeling Actions I want to keep the declarative, idempotent aspect of describing actions on a system. A good place to start could be a hierarchy of dataclasses that hold the same parameters as ansible modules, plus a run() method that performs the action:
@dataclass
class Action:
    """
    Base class for all action implementations.
    An Action is the equivalent of an ansible module: a declarative
    representation of an idempotent operation on a system.
    An Action can be run immediately, or serialized, sent to a remote system,
    run, and sent back with its results.
    """
    uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    result: Result = field(default_factory=Result)
    def summary(self):
        """
        Return a short text description of this action
        """
        return self.__class__.__name__
    def run(self, system: transilience.system.System):
        """
        Perform the action
        """
        self.result.state = ResultState.NOOP
I like that Ansible tasks have names, and I hate having to give names to trivial tasks like "Create directory /foo/bar", so I added a summary() method so that trivial tasks like that can take care of naming themselves. Dataclasses allow to introspect fields and annotate them with extra metadata, and together with docstrings, I can make actions reasonably self-documeting. I ported some of Ansible's modules over: see complete list in the git repository. Running Actions in a script With a bit of glue code I can now run Ansible-style functions from a plain Python script:
#!/usr/bin/python3
from transilience.runner import Script
script = Script()
for i in range(10):
    script.builtin.file(state="touch", path=f"/tmp/test i ")
Running Actions remotely Dataclasses have an asdict function that makes them trivially serializable. If their members stick to data types that can be serialized with Mitogen and the run implementation doesn't use non-pure, non-stdlib Python modules, then I can trivially run actions on all sorts of remote systems using Mitogen:
#!/usr/bin/python3
from transilience.runner import Script
from transilience.system import Mitogen
script = Script(system=Mitogen("my server", "ssh", hostname="machine.example.org", username="user"))
for i in range(10):
    script.builtin.file(state="touch", path=f"/tmp/test i ")
How fast would that be, compared to Ansible?
$ time ansible-playbook test.yaml
[...]
real    0m15.232s
user    0m4.033s
sys 0m1.336s
$ time ./test_script
real    0m4.934s
user    0m0.547s
sys 0m0.049s
With a network round-trip for each single operation I'm already 3x faster than Ansible, and it can run on nspawn containers, too! I always wanted to have a library of ansible modules useable in normal scripts, and I've always been angry with Ansible for not bundling their backend code in a generic library. Well, now there's the beginning of one! Sweet! Next step, pipelining.

Enrico Zini: My gripes with Ansible

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience. Musing about Ansible I like infrastructure as code. I like to be able to represent an entire system as text files in a git repositories, and to be able to use that to recreate the system, from my Virtual Private Server, to my print server and my stereo, to build machines, to other kind of systems I might end up setting up. I like that the provisioning work I do on a machine can be self-documenting and replicable at will. The good For that I quite like Ansible, in principle: simple (in theory) YAML files describe a system in (reasonably) high-level steps, and it can be run on (almost) any machine that happens to have a simple Python interpreter installed. I also like many of the modules provided with Ansible: they are convenient, platform-independent implementations of common provisioning steps. They'd be fantastic to have in a library that I could use in normal programs. The bad Unfortunately, Ansible is slow. Running the playbook on my VPS takes about 3 whole minutes even if I'm just changing a line in a configuration file. This means that most of the time, instead of changing that line in the playbook and running it, to then figure out after 3 minutes that it was the wrong line, or I made a spelling mistake in the playbook, I end up logging into the server and editing in place. That defeats the whole purpose, but that level of latency between iterations is just unacceptable to me. The ugly I also think that Ansible has outgrown its original design, and the supposedly declarative, idempotent YAML has become a full declarative scripting language in disguise, whose syntax is extremely awkward and verbose. If I'm writing declarative descriptions, YAML is great. If I'm writing loops and conditionals, I want to write code, not templated YAML. I also keep struggling trying to use Ansible to provision chroots and nspawn containers. A personal experiment: Transilience There's another thing I like in Ansible: it's written in Python, which is a language I'm comfortable with. Compared to other platforms, it's one that I'm more likely to be able to control beyond being a simple user. What if I can port Ansible modules into a library of high-level provisioning functions, that I can just run via normal Python scripts? What if I can find a way to execute those scripts remotely and not just locally? I've started writing some prototype code, and the biggest problem is, of course, finding a name. Ansible comes from Ursula K. Le Guin's Hainish Cycle novels, where it is a device that allows its users to communicate near-instantaneously over interstellar distances. Traveling, however, is still constrained by the speed of light. Later in the same universe, the novels A Fisherman of the Inland Sea and The Shobies' Story, talk about experiments with instantaneous interstellar travel, as a science Ursula Le Guin called transilience:
Transilience: n. A leap across or from one thing to another [1913 Webster]
Transilience. I like everything about this name. Now that the hardest problem is solved, the rest is just a simple matter of implementation details.

9 June 2021

Enrico Zini: Ansible recurse and follow quirks

I'm reading Ansible's builtin.file sources for, uhm, reasons, and the use of follow stood out to my eyes. Reading on, not only that. I feel like the ansible codebase needs a serious review, at least in essential core modules like this one. In the file module documentation it says:
This flag indicates that filesystem links, if they exist, should be followed.
In the recursive_set_attributes implementation instead, follow means "follow symlinks to directories", but if a symlink to a file is found, it does not get followed, kind of. What happens is that ansible will try to change the mode of the symlink, which makes sense on some operating systems. And it does try to use lchmod if present. Buf if not, this happens:
# Attempt to set the perms of the symlink but be
# careful not to change the perms of the underlying
# file while trying
underlying_stat = os.stat(b_path)
os.chmod(b_path, mode)
new_underlying_stat = os.stat(b_path)
if underlying_stat.st_mode != new_underlying_stat.st_mode:
    os.chmod(b_path, stat.S_IMODE(underlying_stat.st_mode))
So it tries doing chmod on the symlink, and if that changed the mode of the actual file, switch it back. I would have appreciated a comment documenting on which systems a hack like this makes sense. As it is, it opens a very short time window in which a symlink attack can make a system file vulerable, and an exception thrown by the second stat will make it vulnerable permanently. What about follow following links during recursion: how does it avoid loops? I don't see a cache of (device, inode) pairs visited. Let's try:
fatal: [localhost]: FAILED! =>  "changed": false, "details": "maximum recursion depth exceeded", "gid": 1000, "group": "enrico", "mode": "0755", "msg": "mode must be in octal or symbolic form", "owner": "enrico", "path": "/tmp/test/test1", "size": 0, "state": "directory", "uid": 1000 
Ok, it, uhm, delegates handling that to the Python stack size. I guess it means that a ln -s .. foo in a directory that gets recursed will always fail the task. Fun! More quirks Turning a symlink into a hardlink is considered a noop if the symlink points to the same file:
---
- hosts: localhost
  tasks:
   - name: create test file
     file:
        path: /tmp/testfile
        state: touch
   - name: create test link
     file:
        path: /tmp/testlink
        state: link
        src: /tmp/testfile
   - name: turn it into a hard link
     file:
        path: /tmp/testlink
        state: hard
        src: /tmp/testfile
gives:
$ ansible-playbook test3.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [create test file] *****************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [create test link] *****************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [turn it into a hard link] *********************************************************************************************************************************************************************************************
ok: [localhost]
PLAY RECAP ******************************************************************************************************************************************************************************************************************
localhost                  : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
More quirks Converting a directory into a hardlink should work, but it doesn't because unlink is used instead of rmdir:
---
- hosts: localhost
  tasks:
   - name: create test dir
     file:
        path: /tmp/testdir
        state: directory
   - name: turn it into a symlink
     file:
        path: /tmp/testdir
        state: hard
        src: /tmp/
        force: yes
gives:
$ ansible-playbook test4.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [create test dir] ******************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [turn it into a symlink] ***********************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! =>  "changed": false, "gid": 1000, "group": "enrico", "mode": "0755", "msg": "Error while replacing: [Errno 21] Is a directory: b'/tmp/testdir'", "owner": "enrico", "path": "/tmp/testdir", "size": 0, "state": "directory", "uid": 1000 
PLAY RECAP ******************************************************************************************************************************************************************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
More quirks This is hard to test, but it looks like if source and destination are hardlinks to the same inode numbers, but on different filesystems, the operation is considered a successful noop: https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/file.py#L821 It should probably be something like:
if (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino):

8 June 2021

Enrico Zini: Mock syscalls with C++

I wrote and maintain some C++ code to stream high quantities of data as fast as possible, and I try to use splice and sendfile when available. The availability of those system calls varies at runtime according to a number of factors, and the code needs to be written to fall back to read/write loops depending on what the splice and sendfile syscalls say. The tricky issue is unit testing: since the code path chosen depends on the kernel, the test suite will test one path or the other depending on the machine and filesystems where the tests are run. It would be nice to be able to mock the syscalls, and replace them during tests, and it looks like I managed. First I made catalogues of the mockable syscalls I want to be able to mock. One with function pointers, for performance, and one with std::function, for flexibility:
/**
 * Linux versions of syscalls to use for concrete implementations.
 */
struct ConcreteLinuxBackend
 
    static ssize_t (*read)(int fd, void *buf, size_t count);
    static ssize_t (*write)(int fd, const void *buf, size_t count);
    static ssize_t (*writev)(int fd, const struct iovec *iov, int iovcnt);
    static ssize_t (*sendfile)(int out_fd, int in_fd, off_t *offset, size_t count);
    static ssize_t (*splice)(int fd_in, loff_t *off_in, int fd_out,
                             loff_t *off_out, size_t len, unsigned int flags);
    static int (*poll)(struct pollfd *fds, nfds_t nfds, int timeout);
    static ssize_t (*pread)(int fd, void *buf, size_t count, off_t offset);
 ;
/**
 * Mockable versions of syscalls to use for testing concrete implementations.
 */
struct ConcreteTestingBackend
 
    static std::function<ssize_t(int fd, void *buf, size_t count)> read;
    static std::function<ssize_t(int fd, const void *buf, size_t count)> write;
    static std::function<ssize_t(int fd, const struct iovec *iov, int iovcnt)> writev;
    static std::function<ssize_t(int out_fd, int in_fd, off_t *offset, size_t count)> sendfile;
    static std::function<ssize_t(int fd_in, loff_t *off_in, int fd_out,
                                 loff_t *off_out, size_t len, unsigned int flags)> splice;
    static std::function<int(struct pollfd *fds, nfds_t nfds, int timeout)> poll;
    static std::function<ssize_t(int fd, void *buf, size_t count, off_t offset)> pread;
    static void reset();
 ;
Then I converted the code to templates, parameterized on the catalogue class. Explicit template instantiation helps in making sure that one doesn't need to include template code in all sorts of places. Finally, I can have a RAII class for mocking:
/**
 * RAII mocking of syscalls for concrete stream implementations
 */
struct MockConcreteSyscalls
 
    std::function<ssize_t(int fd, void *buf, size_t count)> orig_read;
    std::function<ssize_t(int fd, const void *buf, size_t count)> orig_write;
    std::function<ssize_t(int fd, const struct iovec *iov, int iovcnt)> orig_writev;
    std::function<ssize_t(int out_fd, int in_fd, off_t *offset, size_t count)> orig_sendfile;
    std::function<ssize_t(int fd_in, loff_t *off_in, int fd_out,
                                 loff_t *off_out, size_t len, unsigned int flags)> orig_splice;
    std::function<int(struct pollfd *fds, nfds_t nfds, int timeout)> orig_poll;
    std::function<ssize_t(int fd, void *buf, size_t count, off_t offset)> orig_pread;
    MockConcreteSyscalls();
    ~MockConcreteSyscalls();
 ;
MockConcreteSyscalls::MockConcreteSyscalls()
    : orig_read(ConcreteTestingBackend::read),
      orig_write(ConcreteTestingBackend::write),
      orig_writev(ConcreteTestingBackend::writev),
      orig_sendfile(ConcreteTestingBackend::sendfile),
      orig_splice(ConcreteTestingBackend::splice),
      orig_poll(ConcreteTestingBackend::poll),
      orig_pread(ConcreteTestingBackend::pread)
 
 
MockConcreteSyscalls::~MockConcreteSyscalls()
 
    ConcreteTestingBackend::read = orig_read;
    ConcreteTestingBackend::write = orig_write;
    ConcreteTestingBackend::writev = orig_writev;
    ConcreteTestingBackend::sendfile = orig_sendfile;
    ConcreteTestingBackend::splice = orig_splice;
    ConcreteTestingBackend::poll = orig_poll;
    ConcreteTestingBackend::pread = orig_pread;
 
And here's the specialization to pretend sendfile and splice aren't available:
/**
 * Mock sendfile and splice as if they weren't available on this system
 */
struct DisableSendfileSplice : public MockConcreteSyscalls
 
    DisableSendfileSplice();
 ;
DisableSendfileSplice::DisableSendfileSplice()
 
    ConcreteTestingBackend::sendfile = [](int out_fd, int in_fd, off_t *offset, size_t count) -> ssize_t  
        errno = EINVAL;
        return -1;
     ;
    ConcreteTestingBackend::splice = [](int fd_in, loff_t *off_in, int fd_out,
                                        loff_t *off_out, size_t len, unsigned int flags) -> ssize_t  
        errno = EINVAL;
        return -1;
     ;
 
It's now also possible to reproduce in the test suite all sorts of system-related issues we might observe in production over time.

5 June 2021

Enrico Zini: Ansible blockinfile oddity

I was reading Ansible's blockinfile sources for, uhm, reasons, and the code flow looked a bit odd. So I checked what happens if a file has spurious block markers. Give this file:
$ cat /tmp/test.orig
line0
# BEGIN ANSIBLE MANAGED BLOCK
line1
# END ANSIBLE MANAGED BLOCK
line2
# END ANSIBLE MANAGED BLOCK
line3
# BEGIN ANSIBLE MANAGED BLOCK
line4
And this playbook:
$ cat test.yaml
---
- hosts: localhost
  tasks:
   - name: test blockinfile
     blockinfile:
        block: NEWLINE
        path: /tmp/test
You get this result:
$ cat /tmp/test
line0
# BEGIN ANSIBLE MANAGED BLOCK
line1
# END ANSIBLE MANAGED BLOCK
line2
# BEGIN ANSIBLE MANAGED BLOCK
NEWLINE
# END ANSIBLE MANAGED BLOCK
line4
I was hoping that I was reading the code incorrectly, but it turns out that Ansible's blockinfile matches the last pair of begin-end markers it finds, in whatever order it finds them.

21 April 2021

Enrico Zini: Python output buffering

Here's a little toy program that displays a message like a split-flap display:
#!/usr/bin/python3
import sys
import time
def display(line: str):
    cur = '0' * len(line)
    while True:
        print(cur, end="\r")
        if cur == line:
            break
        time.sleep(0.09)
        cur = "".join(chr(min(ord(c) + 1, ord(oc))) for c, oc in zip(cur, line))
    print()
message = " ".join(sys.argv[1:])
display(message.upper())
This only works if the script's stdout is unbuffered. Pipe the output through cat, and you get a long wait, and then the final string, without the animation. What is happening is that since the output is not going to a terminal, optimizations kick in that buffer the output and send it in bigger chunks, to make processing bulk I/O more efficient. I haven't found a good introductory explanation of buffering in Python's documentation. The details seem to be scattered in the io module documentation and they mostly assume that one is already familiar with concepts like unbuffered, line-buffered or block-buffered. The libc documentation has a good quick introduction that one can read to get up to speed. Controlling buffering in Python In Python, one can force a buffer flush with the flush() method of the output file descriptor, like sys.stdout.flush(), to make sure pending buffered output gets sent. Python's print() function also supports flush=True as an optional argument:
    print(cur, end="\r", flush=True)
If one wants to change the default buffering for a file descriptor, since Python 3.7 there's a convenient reconfigure() method, which can reconfigure line buffering only:
sys.stdout.reconfigure(line_buffering=True)
Otherwise, the technique is to reassign sys.stdout to something that has the behaviour one wants (code from this StackOverflow thread):
import io
# Python 3, open as binary, then wrap in a TextIOWrapper with write-through.
sys.stdout = io.TextIOWrapper(open(sys.stdout.fileno(), 'wb', 0), write_through=True)
If one needs all this to implement a progressbar, one should make sure to have a look at the progressbar module first.

24 March 2021

Enrico Zini: Stallman

When I hear Stallman saying "and I'm not planning to resign a second time", the only thing I can see is a dangerous person making a power move. I'll be wary of FSF from now on. For this and other reasons, I have signed this open letter.

13 March 2021

Enrico Zini: nspawn-runner and ansible

This post is part of a series about trying to setup a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub. New goal: make it easier to configure and maintain chroots. For example, it should be possible to maintain a rolling testing or sid chroot without the need to manually log into it to run apt upgrade. It should be also be easy to have multiple runners reasonably in sync by carrying around a few simple configuration files, representing the set of images available to the CI. Ideally, those configuration files could simply be one ansible playbook per chroot. nspawn-runner could have a 'chroot-maintenance' command that runs all the playbooks on their corresponding chroots, and that would be all I need. ansible and systemd-nspawn ansible being inadequate as usual, it still does not have a nspawn or machinectl connector, even though the need is there, and one can find requests, pull requests, and implementation attempts by all sorts of people, including me. However, I don't want to have nspawn-runner depend on random extra plugins. There's a machinectl become plugin available in ansible from buster-backports, but no matter how I read its scant documentation, looked around the internet, and tried all sorts of things, I couldn't manage to figure out what it is for. This said, simply using systemd-nspawn instead of chroot is quite trivial: use ansible_connection: chroot, set ansible_chroot_exe to this shellscript, and it just works, with things properly mounted, internet access, correct hostnames, and everything:
#!/bin/sh
chroot="$1"
shift
exec systemd-nspawn --console=pipe -qD "$chroot" -- "$@"
I guess that's a, uhm, plan, I guess? Running playbooks As an initial prototype, I made nspawn check the list of chroots in /var/lib/nspawn-runner, and for each directory found there, check if there's an equivalent .yaml or .yml file next to nspawn-runner. For each chroot+playbook combination, it creates an inventory with the right setup, including the custom chroot command, and runs ansible-playbook. As a prototype it works. I assume once it will see usage there's going to be feedback and fine tuning; meanwhile, I have the beginning of some automated maintenance for the CI chroots. Next step It would be nice to also have nspawn-runner create the chroots from configuration files if they are missing, so that a new runner can be deployed with a minimal effort, and it will proceed to generate all the images required in a single command. For this, I'd like to find a clean way to store the chroot creation command inside the playbooks, to keep just one configuration file per chroot. I'd also like to have it flexible enough to run debootstrap, as well as commands for different distributions. Time will tell. This is probably enough for study/design posts on my blog. Further updates will be in the issue tracker.

Enrico Zini: Decameron and content warnings

Giovanni Boccaccio's Decameron is a book in which there's a pandemic, and a group of seven young women and three young men decide to practice social distancing in a secluded villa just outside of Florence. During two weeks of lockdown, they structure their time wisely, and among other things they spend the evenings telling each other stories. The book is a collection of those stories. The stories vary from the erotic to the tragic, and may trigger some reader. Boccaccio is aware of that, so he starts each chapter with a one line summary that is explicitly intended as a content warning, to give the reader a choice in whether to read the story, or skip to the next one. Boccaccio explains this in the book itself:
However, those who read these tales can leave those they dislike and read those they like. I do not want to deceive anybody, and so all these tales bear written at the head a title explaining what they contain. (link to version in original language)
On top of the very relatable setting, the diversity of stories, the empathy in the narration, the gender inclusivity, and the respect for the reader, it's a pretty good book. It should even be out of copyright! It was written around year 1350.

Enrico Zini: nspawn-runner and btrfs

This post is part of a series about trying to setup a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub. systemd-nspawn has an interesting --ephemeral option that sets up temporary copy-on-write filesystem snapshots on filesystems that support it, like btrfs. Using copy on write means that one could perform maintenance on the source chroots, without disrupting existing CI sessions. btrfs and copy on write btrfs snapshots work on subvolumes. As I understand it, if one uses btrfs subvolume create instead of mkdir, what is inside the resulting directory is managed as a subvolume that can be snapshotted and managed in all sorts of interesting ways. I managed to delete a subvolume equally well with btrfs subvolume delete and with rm -r. btrfs subvolume snapshot src dst is similar to cp -a, but it makes a copy-on-write snapshot of a btrfs subvolume. If I change nspawn-runner to manage each chroot in its own subvolume, I should be able to build on all these features, and systemd-nspawn should be able to do that, too. There's a cute shortcut to migrate a subdirectory to a subvolume: create the subvolume, then use cp -r --reflink to populate the subvolume with the directory contents. systemd-nspawn and btrfs Passing -x/--ephemeral to systemd-nspawn makes it do all the transient copy-on-write work automatically:
# systemd-nspawn -xD buster
Spawning container buster-7fd47ac79296c5d3 on /var/lib/nspawn-runner/t/.#machine.buster0939fbc61fcbca28.
Press ^] three times within 1s to kill container.
root@buster-7fd47ac79296c5d3:~# mkdir foo
root@buster-7fd47ac79296c5d3:~# ls -la
total 12
drwx------ 1 root root  62 Mar 13 16:30 .
drwxr-xr-x 1 root root 154 Mar 13 16:26 ..
-rw------- 1 root root 102 Mar 13 16:26 .bash_history
-rw-r--r-- 1 root root 570 Mar 13 16:26 .bashrc
-rw-r--r-- 1 root root 148 Mar 13 16:26 .profile
drwxr-xr-x 1 root root   0 Mar 13 16:30 foo
root@buster-7fd47ac79296c5d3:~# logout
Container buster-7fd47ac79296c5d3 exited successfully.
root@runner2:/var/lib/nspawn-runner/t# ls -la buster/root/
totale 12
drwx------ 1 root root  56 mar 13 16:26 .
drwxr-xr-x 1 root root 154 mar 13 16:26 ..
-rw------- 1 root root 102 mar 13 16:26 .bash_history
-rw-r--r-- 1 root root 570 mar 13 16:26 .bashrc
-rw-r--r-- 1 root root 148 mar 13 16:26 .profile
It also works on a directory that is not a subvolume, making reflinks of its contents instead of a subvolume snapshot, although this has a performance penalty on setup: Snapshotting a subvolume:
# time systemd-nspawn -xD buster ls
Spawning container buster-7ab8f4123420b5d5 on /var/lib/nspawn-runner/t/.#machine.bustercd54ef4971229ff5.
Press ^] three times within 1s to kill container.
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
Container buster-7ab8f4123420b5d5 exited successfully.
real    0m0,164s
user    0m0,032s
sys 0m0,014s
Reflink-ing a subdirectory:
# time systemd-nspawn -xD buster ls
Spawning container buster-ebc9dc77db0c972d on /var/lib/nspawn-runner/.#machine.buster2ecbcbd1a1a058b8.
Press ^] three times within 1s to kill container.
bin  boot  dev  etc  home  lib  lib32  lib64  libx32  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
Container buster-ebc9dc77db0c972d exited successfully.
real    0m3,022s
user    0m0,326s
sys 0m2,406s
Detecting filesystem type I can change nspawn-runner to use btrfs-specific features only when /var/lib/nspawn-runner is on btrfs. Here's a command to detect the filesystem type:
# stat -f -c %T /var/lib/nspawn-runner/
btrfs
nspawn-runner updated I've refactored nspawn-runner splitting backend and frontend code, and implementing multiple backends based on what's the filesystem type of /var/lib/nspawn-runner/. It works really nicely, and with no special configuration required: if /var/lib/nspawn-runner is on btrfs, things run faster, with less kludges, and one can do maintenance on the base chroots without interfering with running CI jobs. Next step The next step is making it easier to configure and maintain chroots. For example, it should be possible to maintain a rolling testing or sid chroot without the need to manually log into it to run apt upgrade.

7 March 2021

Enrico Zini: Women in science

Timeline of women in science - Wikipedia
this is a timeline of women in science, spanning from ancient history up to the 21st century. While the timeline primarily focuses on women involved with natural sciences such as astronomy, biology, chemistry and physics, it also includes women from the social sciences (e.g. sociology, psychology) and the formal sciences (e.g. mathematics, computer science), as well as notable science educators and medical scientists. The chronological events listed in the timeline relate to both scientific achievements and gender equality within the sciences.
List of women's firsts - Wikipedia
This is a list of women's firsts noting the first time that a woman or women achieved a given historical feat. A shorthand phrase for this development is "breaking the gender barrier" or "breaking the glass ceiling." Other terms related to the glass ceiling can be used for specific fields related to those terms, such as "breaking the brass ceiling" for women in the military and "breaking the stained glass ceiling" for women clergy. Inclusion on the list is reserved for achievements by women that have significant historical impact.
List of Women in Technology International Hall of Fame inductees - Wikipedia
The Women in Technology International Hall of Fame was established in 1996 by Women in Technology International (WITI) to honor women who contribute to the fields of science and technology.
Women in Chemistry Compound Interest
8 March is International Women s Day. As in previous years, I ve put together another edition of this series looking at underappreciated women from chemistry history.
And some men, too:

28 February 2021

Enrico Zini: Links about privilege

A reality check about the myth of starting from nothing and becoming successful: Evocative scientific research: On the other side of privilege: On reality and representation of reality: Will they stop at nothing? What are they going to want from us next, our blood? Maybe. How old are you?

21 February 2021

Enrico Zini: Software development links

Next time we'll iterate on Himblick design and development, Raspberry Pi 4 can now run plain standard Debian, which should make a lot of things easier and cleaner when developing products based on it. Somewhat related to nspawn-runner, random links somehow related to my feeling that nspawn comes from an ecosystem which gives me a bigger sense of focus on security and solidity than Docker: I did a lot of work on A38, a Python library to deal with FatturaPA electronic invoicing, and it was a wonderful surprise to see a positive review spontaneously appear! : Fattura elettronica, come visualizzarla con python TuttoLogico A beautiful, hands-on explanation of git internals, as a step by step guide to reimplementing your own git: Git Internals - Learn by Building Your Own Git I recently tried meson and liked it a lot. I then gave unity builds a try, since it supports them out of the box, and found myself with doubts. I found I wasn't alone, and I liked The Evils of Unity Builds as a summary of the situation. A point of view I liked on technological debt: Technical debt as a lack of understanding Finally, a classic, and a masterful explanation for a question that keeps popping up: RegEx match open tags except XHTML self-contained tags

14 February 2021

Enrico Zini: Mindustry links

Mindustry is really well made computer game that I enjoyed playing a lot. It is Free Software. Here are two guides to get a deeper idea about details of the game: r/Mindustry - How unattended sector defense works (most effective turrets and such) is a useful explanation I found of what happens in Mindustry 6 when I leave a sector to itself. And #959466 is the RFP (wink, wink!)

7 February 2021

Enrico Zini: Language links

In English In Italiano

31 January 2021

Enrico Zini: Some interesting software bugs

25 January 2021

Enrico Zini: nspawn-runner: support for image selection

.gitlab-ci.yml supports 'image' to allow selecting in which environment the script gets run. The documentation says "Used to specify a Docker image to use for the job", but it's clearly a bug in the documentation, because we can do it with nspawn-runner, too. It turns out that most of the environment variables available to CI runs are also available to custom runner scripts. In this case, the value passed as image can be found as $CUSTOM_ENV_CI_JOB_IMAGE in the custom runner scripts environment. After some experimentation I made this commit that makes every chroot under /var/lib/nspawn-runner available as an image:
# Set up 3 new images for CI jobs:
nspawn-runner chroot-create buster
nspawn-runner chroot-create bullseye
nspawn-runner chroot-create sid
That's it, CI scripts can now use image: buster, image: bullseye or image: sid, as they please. You can manually set up other chroots under /var/lib/nspawn-runner and they'll be automatically available. You can also now choose a default image in config.toml in case the CI script doesn't specify one:
prepare_args = ["--verbose", "prepare", "--default-image=buster"]

24 January 2021

Enrico Zini: Miscellaneous news links

Some curious news. Here's a scam that targets terrorists: The Doomsday Scam. On the general topic of scams, a List of confidence tricks. Then two news involving bread: And a last one about how people needed to rename human genes to work around Excel bugsfeatures:

22 January 2021

Enrico Zini: Polishing nspawn-runner

This post is part of a series about trying to setup a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub. gitlab-runner supports adding extra arguments to the custom scripts, and I can take advantage of that to pack all the various scripts that I prototyped so far into an all-in-one nspawn-runner command:
usage: nspawn-runner [-h] [-v] [--debug]
                      chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml 
                     ...
Manage systemd-nspawn machines for CI runs.
positional arguments:
   chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml 
                        sub-command help
    chroot-create       create a chroot that serves as a base for ephemeral
                        machines
    chroot-login        enter the chroot to perform maintenance
    prepare             start an ephemeral system for a CI run
    run                 run a command inside a CI machine
    cleanup             cleanup a CI machine after it's run
    gitlab-config       configuration step for gitlab-runner
    toml                output the toml configuration for the custom runner
optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --debug               verbose output
chroot maintenance chroot-create and chroot-login are similar to what pbuilder, cowbuilder, schroot, debspawn and similar tools do. They only take a chroot name, and default the rest of paths to where nspawn-runner expects things to be under /var/lib/nspawn-runner. gitlab-runner setup nspawn-runner toml <chroot-name> outputs a snippet to add to /etc/gitlab-runner/config.toml to configure the CI. For example:
$ ./nspawn-runner toml buster
[[runners]]
  name="buster"
  url="TODO"
  token="TODO"
  executor = "custom"
  builds_dir = "/var/lib/nspawn-runner/.build"
  cache_dir = "/var/lib/nspawn-runner/.cache"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/home/enrico/ /nspawn-runner/nspawn-runner"
    config_args = ["gitlab-config"]
    config_exec_timeout = 200
    prepare_exec = "/home/enrico/ /nspawn-runner/nspawn-runner"
    prepare_args = ["prepare", "buster"]
    prepare_exec_timeout = 200
    run_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    run_args = ["run"]
    cleanup_exec = "/home/enrico/ /nspawn-runner/nspawn-runner"
    cleanup_args = ["cleanup"]
    cleanup_exec_timeout = 200
    graceful_kill_timeout = 200
    force_kill_timeout = 200
One needs to remember to set url and token, and the runner is configured. The end, for now This is it, it works! Time will tell what issues or ideas will come up: for now, it's a pretty decent first version. The various prepare, run, cleanup steps are generic enough that they can be used outside of gitlab-runner: feel free to build on them, and drop me a note if you find this useful! Updated: Issues noticed so far, that could go into a new version:

Next.

Previous.